A NOVEL FPGA ARCHITECTURE FOR A RECONFIGURABLE PROCESSOR

1. INTRODUCTION:

Until recently, FPGAs were used mainly for prototyping ASIC designs and for low-volume production, mostly because of their low speed, high per-unit cost and high power consumption. However, thanks to improvements in FPGA technology, soaring non-recurring engineering (NRE) costs and shortening time-to-market requirements, there is increasing interest in using FPGAs instead of ASICs for embedded systems design [1]. So far, most FPGA designs follow the traditional ASIC design flow, confining reconfigurability to load time. However, run-time or dynamic reconfiguration is of special interest to the research community because it provides a performance/cost advantage over load-time configuration [2]. As a programmable platform, a dynamically reconfigurable architecture only makes sense when it provides a better solution than the alternatives, e.g., superscalar processors and DSPs, in terms of performance, cost, power and development effort. Considering the ever-increasing performance of processors, a good design methodology is essential to the success of this approach.

There are two primary methods in conventional computing for the execution of algorithms. The first is to use hardwired technology, either an Application Specific Integrated Circuit (ASIC) or a group of individual components forming a board-level solution, to perform the operations in hardware. ASICs are designed specifically to perform a given computation, and thus they are very fast and efficient when executing the exact computation for which they were designed. However, the circuit cannot be altered after fabrication. This forces a redesign and refabrication of the chip if any part of its circuit requires modification. This is an expensive process, especially when one considers the difficulties in replacing ASICs in a large number of deployed systems. Board-level circuits are also somewhat inflexible, frequently requiring a board redesign and replacement in the event of changes to the application.

The second method is to use software-programmed microprocessors, a far more flexible solution [10]. Processors execute a set of instructions to perform a computation. By changing the software instructions, the functionality of the system is altered without changing the hardware. However, the downside of this flexibility is that performance can suffer, if not in clock speed then in work rate, and is far below that of an ASIC. The processor must read each instruction from memory, decode its meaning, and only then execute it. This results in a high execution overhead for each individual operation. Additionally, the set of instructions that may be used by a program is determined at the fabrication time of the processor. Any other operations that are to be implemented must be built out of existing instructions.

Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much higher performance than software, while maintaining a higher level of flexibility than hardware. Reconfigurable devices, including field-programmable gate arrays (FPGAs), contain an array of computational elements whose functionality is determined through multiple programmable configuration bits. These elements, sometimes known as logic blocks, are connected using a set of routing resources that are also programmable. In this way, custom digital circuits can be mapped to the reconfigurable hardware by computing the logic functions of the circuit within the logic blocks, and using the configurable routing to connect the blocks together to form the necessary circuit.
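
The role of the configuration bits can be illustrated with a small software model. The sketch below assumes that a logic block is a k-input look-up table (LUT), which is a common but not universal organization; the names LogicBlock, configure and half_adder are hypothetical and introduced only for illustration. The same block type is configured as an XOR gate and as an AND gate, and the two are then combined into a half adder, which stands in for the programmable routing.

```python
from itertools import product


class LogicBlock:
    """Hypothetical model of a k-input LUT: 2**k configuration bits define its function."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = list(config_bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # The input bits form an address into the configuration memory
        # (the first input is the most significant address bit).
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.config_bits[address]


def configure(num_inputs, function):
    """Derive the configuration bits of a block from a Boolean function."""
    bits = [function(*combo) & 1 for combo in product((0, 1), repeat=num_inputs)]
    return LogicBlock(num_inputs, bits)


# Identical block type, two different configurations.
xor_block = configure(2, lambda a, b: a ^ b)
and_block = configure(2, lambda a, b: a & b)


def half_adder(a, b):
    """'Routing': feed the same inputs to both blocks and collect (sum, carry)."""
    return xor_block.evaluate(a, b), and_block.evaluate(a, b)
```

Changing the circuit in this model means rewriting only the configuration bits; the evaluate logic, which plays the part of the fixed silicon, stays the same.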

In this paper we present a design flow for dynamically reconfigurable systems, which is a compromise between performance concerns and minimizing design effort [9]. In Section 2, some important ideas of this flow are discussed. Firstly, it is a C-based design flow. Choosing a software language as the design input not only helps to reduce development costs, but is also well suited to dynamic reconfiguration. Secondly, the design exploration can be done at the C level, thus allowing performance improvements and reduced FPGA area with a minimum of design effort.

2. DESIGN ISSUES:

2.1 Target Architecture

One important observation about typical embedded applications is that 80% or more of the execution time is spent in 20% or less of the code. If all of the code is placed on an FPGA, even with dynamic reconfiguration, the result is unlikely to be an efficient implementation in terms of performance/cost ratio. To address this concern, our architecture, similar to Garp, consists of a tightly coupled processor and FPGA (Fig. 1). The FPGA is used as a reconfigurable hardware accelerator, while the processor takes care of all other tasks, including FPGA configuration.
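
The benefit of moving only the hot 20% of the code to the FPGA can be estimated with Amdahl's law. The sketch below is a rough model, not a measurement: the function name overall_speedup, the kernel speedup of 10 and the optional reconfiguration overhead term are assumptions introduced for illustration.

```python
def overall_speedup(hot_fraction, kernel_speedup, reconfig_overhead=0.0):
    """Amdahl's-law estimate of whole-application speedup.

    hot_fraction      -- share of the original run time spent in the accelerated code
    kernel_speedup    -- assumed speedup of that code on the FPGA
    reconfig_overhead -- assumed configuration time, as a share of the original run time
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # New run time, normalized so that the original run time is 1.0.
    new_time = (1.0 - hot_fraction) + hot_fraction / kernel_speedup + reconfig_overhead
    return 1.0 / new_time


if __name__ == "__main__":
    # 80% of the time in the hot code, assumed 10x faster on the FPGA.
    print(round(overall_speedup(0.8, 10.0), 2))
    # Same case, with configuration costing 5% of the original run time.
    print(round(overall_speedup(0.8, 10.0, reconfig_overhead=0.05), 2))
```

Under these assumptions, the part of the code left on the processor bounds the achievable speedup at 1 / (1 - 0.8) = 5, however fast the FPGA kernel is, and any reconfiguration overhead lowers the figure further.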

The processor can be an embedded RISC core such as ARM or MIPS. Here we use the open-source processor LEON, which is compatible with SPARC V8 and has performance similar to that of other RISC cores. The main advantage of an open-source core is that it provides design flexibility, e.g., allowing us to define the interface between the FPGA and the processor. The FPGA is expected to be a small, cost-conscious device, and fast configuration is a desirable feature. A Xilinx Virtex XCV100E has been used for this purpose. The FPGA has direct access to the main memory, which makes porting or compiling C code to the FPGA easier than with other communication models.

Figure 1: Target Architecture of DRES

2.2 C-based Design Flow

Figure 2: Design Flow of Dynamically Reconfigurable Systems

3. RUN-TIME RECONFIGURATION:

Frequently, the areas of a program that can be accelerated through the use of reconfigurable hardware are too numerous or complex to be loaded simultaneously onto the available hardware. For these cases, it is beneficial to be able to swap different configurations in and out of the reconfigurable hardware as they are needed during program execution (Figure 3). This concept is known as run-time reconfiguration (RTR).

Figure 3: Applications which are too large to fit entirely on the reconfigurable hardware can be partitioned into two or more smaller configurations that can occupy the hardware at different times.

Run-time reconfiguration is based upon the concept of virtual hardware, which is similar to virtual memory. Here, the physical hardware is much smaller than the sum of the resources required by each of the configurations. Therefore, instead of reducing the number of configurations that are mapped, we swap them in and out of the actual hardware as they are needed. Because run-time reconfiguration allows more sections of an application to be mapped into hardware than can fit in a non-run-time reconfigurable system, a greater portion of the program can be accelerated. This provides potential for an overall improvement in performance.
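
The analogy with virtual memory can be made concrete with a small model of a configuration manager. The sketch below is illustrative only: the class VirtualHardware, the abstract area units and the least-recently-used (LRU) replacement policy are assumptions, since no particular policy is prescribed here.

```python
from collections import OrderedDict


class VirtualHardware:
    """Hypothetical manager that swaps configurations in and out of a small device."""

    def __init__(self, capacity):
        self.capacity = capacity        # device area, in abstract units
        self.resident = OrderedDict()   # name -> area, least recently used first
        self.reconfigurations = 0       # number of configuration loads

    def used(self):
        return sum(self.resident.values())

    def run(self, name, area):
        """Make configuration `name` resident, evicting others if needed."""
        if area > self.capacity:
            raise ValueError("configuration does not fit on the device")
        if name in self.resident:
            # Already loaded: no reconfiguration cost, just refresh its LRU position.
            self.resident.move_to_end(name)
            return "hit"
        # Evict least recently used configurations until there is enough room.
        while self.used() + area > self.capacity:
            self.resident.popitem(last=False)
        self.resident[name] = area
        self.reconfigurations += 1
        return "loaded"


if __name__ == "__main__":
    device = VirtualHardware(capacity=100)
    for kernel, area in [("fir", 60), ("fft", 60), ("fft", 60), ("fir", 60)]:
        print(kernel, device.run(kernel, area))
    print("reconfigurations:", device.reconfigurations)
```

In this model the two kernels together need 120 area units on a 100-unit device, so they take turns occupying it. The counter of configuration loads is the quantity a real system would try to minimize, because each load costs configuration time.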

4. TEST RESULTS:

In this paper we present only how reconfigurability is achieved using an ATMEL processor kit. First, the C-based design checks whether the received input is an integer or a floating-point number. If the input is an integer, the design selects the normal ALU rather than the coprocessor unit, which performs the floating-point operations. This avoids unnecessary exponent and decimal-point alignment and hence increases the speed of the processor, which is very important in real-time applications. For comparison, we passed the integer operands through both the normal ALU and the floating-point coprocessor unit. Table 1 compares the normal ALU and the coprocessor ALU for integer multiplication.
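
The operand check described above can be sketched in software as follows. This is a behavioural illustration only, not the design that was synthesized: the function names are hypothetical, and the use of math.frexp and math.ldexp merely mimics the separate mantissa and exponent handling that a floating-point unit performs.

```python
import math


def _is_integer(value):
    # bool is a subclass of int in Python; exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def integer_alu_multiply(a, b):
    """Stand-in for the normal ALU: plain integer multiplication."""
    return a * b


def fpu_multiply(a, b):
    """Stand-in for the coprocessor: multiply mantissas, add exponents, renormalize."""
    mant_a, exp_a = math.frexp(a)   # a == mant_a * 2**exp_a
    mant_b, exp_b = math.frexp(b)
    return math.ldexp(mant_a * mant_b, exp_a + exp_b)


def multiply(a, b):
    """Route the operation to the unit that matches the operand types."""
    if _is_integer(a) and _is_integer(b):
        return "alu", integer_alu_multiply(a, b)
    return "fpu", fpu_multiply(float(a), float(b))


if __name__ == "__main__":
    print(multiply(6, 7))       # integer operands take the ALU path
    print(multiply(1.5, 2.0))   # floating-point operands take the coprocessor path
```

Integer operands never enter the mantissa and exponent path in this sketch, which corresponds to the saving in alignment and normalization work described above.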

We tested this algorithm on four device families, namely Xilinx, Spartan, Virtex and CoolRunner (Table 1).

Table 1. Comparison between Various FPGA Device Families for Multiplication

We observed that the Virtex-E FPGA family is suitable for integer multiplication. The results also show that if the same integer is passed through the floating-point ALU, the Virtex-E FPGA family requires a large memory size (109944 KB) and a longer time (75.87 ns) because of pre-normalization and post-normalization.

In the same manner, the C-based design flow first checks the application and then loads that particular application into the FPGA to increase the speed of the processor.

5. CONCLUSION:

Reconfigurable computing is becoming an important part of research in computer architectures and software systems. By placing the computationally intense portions of an application onto the reconfigurable hardware, that application can be greatly accelerated. This is because reconfigurable computing combines many of the benefits of both software and ASIC implementations. Like software, the mapped circuit is flexible, and can be changed over the lifetime of the system or even the lifetime of the application. Similar to an ASIC, reconfigurable systems provide a method to map circuits into hardware. Reconfigurable systems therefore have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute cycle of traditional microprocessors as well as possibly exploiting a greater degree of parallelism.

Reconfigurable hardware systems come in many forms, from a configurable functional unit integrated directly into a CPU, to a reconfigurable coprocessor coupled with a host microprocessor, to a multi-FPGA stand-alone unit. The level of coupling, granularity of computation structures, and form of routing resources are all key points in the design of reconfigurable systems. The use of heterogeneous structures can also greatly add to the overall performance of the final design.

Compilation tools for reconfigurable systems range from simple tools that aid in the manual design and placement of circuits, to fully automatic design suites that use program code written in a high level language to generate circuits and the controlling software. The variety of tools available allows designers to choose between manual and automatic circuit creation for any or all of the design steps. Although automatic tools greatly simplify the design process, manual creation is still important for performance-driven applications. Circuit libraries and circuit generators are additional software tools that enable designers to quickly create efficient designs. These tools attempt to aid the designer in gaining the benefits of manual design without entirely sacrificing the ease of automatic circuit creation.

Finally, run-time reconfiguration provides a method to accelerate a greater portion of a given application by allowing the configuration of the hardware to change over time. Apart from the benefits of added capacity through the use of virtual hardware, run-time reconfiguration also allows for circuits to be optimized based on runtime conditions. In this manner, performance of a reconfigurable system can approach or even surpass that of an ASIC.

6. REFERENCES:

[1] A. A. Aggarwal, D. M. Lewis, “Routing Architectures for Hierarchical Field Programmable Gate Arrays”, International Conference on Computer Design, pp. 475-478, 1994.

[Annapolis97] “Firefly”, Annapolis, MD: Annapolis Micro Systems, 1997.

[2] V. Betz, J. Rose, “Using Architectural “Families” to Increase FPGA Speed and Density”, ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 10-16, 1995.

[Bolotski94] M. Bolotski, A. DeHon, T. F. Knight, “Unifying FPGAs and SIMD Arrays”, ACM/SIGDA Workshop on Field Programmable Gate Arrays, 1994.

[3] V. C. Chan, D. M. Lewis, “Area-Speed Tradeoffs for Hierarchical Field-Programmable Gate Arrays”, ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 51-57, 1996.

[4] S. Churcher, T. Kean, B. Wilkie, “The XC6200 FastMap™ Processor Interface”, Field-Programmable Logic and Applications, pp. 36-43, 1995.

[5] A. DeHon, “Entropy, Counting, and Programmable Interconnect”, ACM/SIGDA Symposium on Field Programmable Gate Arrays, pp. 73-79, 1996.

[6] C. Ebeling, D. C. Green, P. Franklin, “RaPiD – Reconfigurable Pipelined Datapath”, International Workshop on Field-Programmable Logic and Applications, pp. 126-135, 1996.

[7] S. Hauck, T. W. Fry, M. M. Hosler, J. P. Kao, “The Chimaera Reconfigurable Functional Unit”, IEEE Symposium on FPGAs for Custom Computing Machines, pp. 87-96, 1997.

[8] S. Hauck, “Configuration Prefetch for Single Context Reconfigurable Coprocessors”, ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 65-74, 1998.

[9] S. Hauck, “The Roles of FPGAs in Reprogrammable Systems”, Proceedings of the IEEE, Vol. 86, No. 4, pp. 615-639, April 1998.

[10] S. Hauck, A. Agarwal, “Software Technologies for Reconfigurable Systems”, submitted to IEEE Transactions on VLSI Systems, 1998.

[11] S. Hauck, Z. Li, “Don’t Care Discovery for FPGA Configuration Compression”, submitted to International Conference on Computer-Aided Design, 1998.

[12] S. Hauck, Z. Li, E. Schwabe, “Configuration Compression for the Xilinx XC6200 FPGA”, IEEE Symposium on FPGAs for Custom Computing Machines, 1998.

[13] K. Keutzer, “Challenges in CAD for the One Million Gate FPGA”, ACM/SIGDA Symposium on Field-Programmable Gate Arrays, pp. 133-134, 1997.

[14] P. Lysaght, J. Dunlop, “Dynamic Reconfiguration of FPGAs”, in W. R. Moore, W. Luk, Eds., More FPGAs, Oxford, England: Abingdon EE&CS Books, pp. 82-94, 1994.

[15] E. Mirsky, A. DeHon, “MATRIX: A Reconfigurable Computing Architecture with Configurable Instruction Distribution and Deployable Resources”, IEEE Symposium on FPGAs for Custom Computing Machines, pp. 157-166, 1996.

[16] C. A. Moritz, D. Yeung, A. Agarwal, “Exploring Optimal Cost-Performance Designs for RAW Microprocessors”, IEEE Symposium on Field-Programmable Custom Computing Machines, 1998.

Technical College - Bourgas,
